Conversation

@BryanFauble BryanFauble commented Sep 8, 2025

Problem:

  • Upgrading the AWS EKS cluster to 1.33 required upgrading the GuardDuty addon, since the 1.8 version we were running is no longer supported on EKS 1.33.
  • After upgrading the GuardDuty addon to 1.11, the agent failed to start with this error:
2025-09-08T18:51:01.965044Z  INFO amzn_guardduty_agent: Set the log file retention from default value to 12.0 hour.              
2025-09-08T18:51:01.965066Z  WARN amzn_guardduty_agent: Found a issue with cleaning obsolete logs, IoError(Os { code: 2, kind: NotFound, message: "No such file or directory" })                                                                                  
2025-09-08T18:51:01.965084Z  INFO amzn_guardduty_agent: GuardDuty agent starting with 8 worker thread(s) and 100 max blocking threads.                                                                                                                            
2025-09-08T18:51:01.991602Z  INFO amzn_guardduty_agent: Agent fingerprint: f028c51861c3b6e2f4f0e2ad1d834e6f592c74290fff8803000a2dd44f86facf                                                                                                                       
2025-09-08T18:51:01.991767Z  INFO amzn_guardduty_agent: Agent config for Component(s): AgentConfig { component_channel_size: 100, ingestion_endpoint: None, generate_test_messages: false, test_messages_size: 1000, integrity_check_file: None, agent_version: Some("f028c51861c3b6e2f4f0e2ad1d834e6f592c74290fff8803000a2dd44f86facf"), raw_os_version: "Amazon_Linux_2023.8.20250818", os_version: OSUnknown, kernel_version: LinuxKernel6_12, region: None, stage: None, agent_id: "6fd1a480-51b9-4114-b8e4-0de5c2b76b9f", agent_publish_metrics_to_end_point_period: 300s, process_cache_capacity: 16384, container_cache_capacity: 16384, pod_cache_capacity: 16384, task_cache_capacity: 16384, pid_namespace_cache_capacity: 1024, pre_suppressor_cache_capacity: 32768, post_suppressor_cache_capacity: 32768, docker_socket_file_path: "/agent_run/docker.sock", containerd_socket_file_path: "/agent_run/containerd/containerd.sock", scan_for_process_metadata_period: 900s, scan_for_container_metadata_period: 600s, scan_for_pod_metadata_period: 600s, scan_for_task_metadata_period: 600s, eks_cluster_name_env_var: "CLUSTER_NAME", max_file_size_to_hash: 536870912, proc_folder_path: "/host_proc", agent_runtime_environment: EksEc2, ecs_fargate_endpoint_env_var: "ECS_CONTAINER_METADATA_URI_V4", ecs_ec2_endpoint: "http://localhost:51678", log_file_retention: 12.0, agent_log_dir: "", file_hash_calculation_timeout: 5s, file_open_timeout: 1.5s, mnt_ns_pids_map_capacity: 1024, pid_set_cap: 32 }  
2025-09-08T18:51:02.009078Z  INFO amzn_guardduty_agent::dependency_checks: Dependency checks complete.                           
2025-09-08T18:51:02.012551Z  INFO amzn_guardduty_agent_data_model::schema: Event schema rabin fingerprint = "3c2b59453e07bb98"   
2025-09-08T18:51:02.012698Z  INFO amzn_guardduty_agent_data_model::schema: Container schema rabin fingerprint = "5a675cafcf525a5d"                                                                                                                                
2025-09-08T18:51:02.012772Z  INFO amzn_guardduty_agent_data_model::schema: Pod schema rabin fingerprint = "8504ac7492b3cb8b"     
2025-09-08T18:51:02.012853Z  INFO amzn_guardduty_agent_data_model::schema: Task schema rabin fingerprint = "195a73fa36862425"    
2025-09-08T18:51:02.015455Z  INFO amzn_guardduty_agent: GuardDuty agent started ...                                              
2025-09-08T18:51:02.015571Z  INFO amzn_guardduty_agent: Type Ctrl+C to terminate                                                 
2025-09-08T18:51:02.126934Z ERROR amzn_guardduty_agent_publisher::publisher: Agent exits due to AccessDeniedException            
2025-09-08T18:51:02.128040Z  INFO amzn_guardduty_agent: Agent receives SHUTDOWN signal from Publisher                            
2025-09-08T18:51:02.128055Z  INFO amzn_guardduty_agent: Shutdown initiated ...                                                   
2025-09-08T18:51:02.128107Z  INFO amzn_guardduty_agent::controller: Controller terminated ...                                    
2025-09-08T18:51:02.128227Z  INFO amzn_guardduty_agent: GuardDuty agent terminated                                               
2025-09-08T18:51:02.128275Z  INFO amzn_guardduty_agent_metrics::metrics_manager: Metrics Manager terminated ...  
  • To debug the issue further, I edited the environment variables on the aws-guardduty-agent DaemonSet Kubernetes resource, adding RUST_LOG set to debug:
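The edit above amounts to adding one environment variable to the agent container. A minimal sketch of the resulting DaemonSet fragment (the aws-guardduty-agent name comes from the description above; the amazon-guardduty namespace is an assumption based on the addon's defaults, and everything besides the RUST_LOG entry is managed by the addon):

```yaml
# Fragment of the aws-guardduty-agent DaemonSet (namespace assumed: amazon-guardduty).
# Only the RUST_LOG entry is added for debugging; remove it once done.
spec:
  template:
    spec:
      containers:
        - name: aws-guardduty-agent
          env:
            - name: RUST_LOG
              value: debug
```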

This showed me the following error in the logs:

"{\"Message\":\"User: arn:aws:sts::631692904429:assumed-role/aws:ec2-instance/i-03dda5ce8a9fab5d4 is not authorized to perform: guardduty:SendSecurityTelemetry on resource: arn:aws:guardduty:us-east-1:631692904429:detector/56c6fe1e2fec16bf1cbf75c4fbf90386 with an explicit deny in a VPC endpoint policy\"}",

This error led me to verify the VPC endpoint policy, which in turn led me to https://docs.aws.amazon.com/guardduty/latest/ug/eksrunmon-prereq-deploy-security-agent.html, where I found that I had an incorrect configuration value.
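For reference, the VPC endpoint policy shape described in the linked docs looks roughly like the following sketch: a blanket Allow plus an explicit Deny for any principal outside the expected account. The account ID here is taken from the error message above; verify the exact policy against the AWS docs before applying it.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": "*",
      "Action": "*",
      "Resource": "*"
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "*",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalAccount": "631692904429"
        }
      }
    }
  ]
}
```

If the aws:PrincipalAccount value does not match the account the cluster's instance role belongs to, the Deny statement produces exactly the "explicit deny in a VPC endpoint policy" error seen in the agent logs.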

Solution:

  • Correct the VPC endpoint policy configuration value so the GuardDuty agent is authorized to call guardduty:SendSecurityTelemetry

Testing:

  • Will test in the sandbox cluster before promoting to the other environments
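As a quick pre-deployment sanity check, the Deny condition can be evaluated locally against the cluster's account ID before applying the policy. This is a hypothetical helper, not part of this PR or any AWS SDK: it only models the single StringNotEquals-on-aws:PrincipalAccount pattern used by the policy above, not full IAM evaluation logic.

```python
import json

def account_denied(policy: dict, account_id: str) -> bool:
    """Return True if a Deny statement conditioned on aws:PrincipalAccount
    would apply to the given account (the single pattern checked here)."""
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Deny":
            continue
        cond = stmt.get("Condition", {}).get("StringNotEquals", {})
        allowed = cond.get("aws:PrincipalAccount")
        if allowed is None:
            continue
        # The condition value may be a single string or a list of strings.
        allowed_accounts = [allowed] if isinstance(allowed, str) else allowed
        if account_id not in allowed_accounts:
            return True  # the explicit deny applies to this account
    return False

policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Effect": "Allow", "Principal": "*", "Action": "*", "Resource": "*"},
    {"Effect": "Deny", "Principal": "*", "Action": "*", "Resource": "*",
     "Condition": {"StringNotEquals": {"aws:PrincipalAccount": "631692904429"}}}
  ]
}
""")

print(account_denied(policy, "631692904429"))  # -> False: the cluster's own account passes
print(account_denied(policy, "999999999999"))  # -> True: any other account is denied
```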

@BryanFauble BryanFauble changed the title [DPE-1424] Update AWS GuardDuty addon to use an IAM/IRSA [DPE-1424] Update AWS GuardDuty addon VPCE policy Sep 8, 2025

@thomasyu888 thomasyu888 left a comment


🔥 LGTM!

@BryanFauble BryanFauble merged commit 214a5b1 into main Sep 10, 2025
7 checks passed
@BryanFauble BryanFauble deleted the dpe-1423-upgrade-guardduty-addon branch September 10, 2025 15:14
